A Statistical Approach to Machine Aided Translation of Terminology Banks

Authors

  • Jyun-Sheng Chang
  • Andrew Chang
  • Tsuey-Fen Lin
  • Sur-Jin Ker
Abstract

This paper reports on a new statistical approach to machine aided translation of terminology banks. The text in the bank is hyphenated and then dissected into roots of 1 to 3 syllables. Both hyphenation and dissection are done with a set of initial probabilities of syllables and roots. The probabilities are repeatedly revised using an EM algorithm. After each iteration of hyphenation or dissection, the resulting syllables and roots are counted to yield a more precise estimate of the probabilities. The set of roots rapidly converges to a set of most likely roots. Preliminary experiments have shown promising results. From a terminology bank of more than 4,000 terms, the algorithm extracts 223 general and chemical roots, of which 91% are actually roots. The algorithm dissects a word into roots with a hit rate of around 86%. The set of roots and their hand translations are then used in a compositional translation of the terminology bank. One can expect the translation of a terminology bank using this approach to be more cost-effective and more consistent, and to have better closure.
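The abstract does not spell out the model, so the following is only a minimal sketch of the dissect-and-recount loop under stated assumptions. Words are assumed to arrive already hyphenated into syllables (the syllable-level EM step is not shown). Roots are scored with a simple unigram model. The EM re-estimation is simplified to hard (Viterbi) EM, which recounts only the single best dissection of each word instead of expected counts over all dissections. The function names, the `-` syllable separator, and the placeholder glossary used by `translate_term` are all hypothetical, and joining root translations in source order is an assumption about how compositional translation might work, not the authors' method.

```python
import math
from collections import Counter

MAX_ROOT_SYLLABLES = 3  # the abstract dissects words into roots of 1 to 3 syllables


def candidate_roots(syllables):
    """Yield every contiguous run of 1..3 syllables in a hyphenated word."""
    n = len(syllables)
    for i in range(n):
        for j in range(i + 1, min(i + MAX_ROOT_SYLLABLES, n) + 1):
            yield "-".join(syllables[i:j])


def to_log_prob(counts):
    """Turn root counts into log-probabilities (relative frequencies)."""
    total = sum(counts.values())
    return {root: math.log(c / total) for root, c in counts.items()}


def best_dissection(syllables, log_prob):
    """Viterbi search for the most probable split of a word into known roots."""
    n = len(syllables)
    best = [-math.inf] * (n + 1)  # best[k]: best log-prob for syllables[:k]
    back = [0] * (n + 1)          # back[k]: start index of the last root
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_ROOT_SYLLABLES), end):
            root = "-".join(syllables[start:end])
            if root in log_prob and best[start] + log_prob[root] > best[end]:
                best[end] = best[start] + log_prob[root]
                back[end] = start
    if best[n] == -math.inf:
        return None  # no split into known roots exists
    roots, end = [], n
    while end > 0:
        start = back[end]
        roots.append("-".join(syllables[start:end]))
        end = start
    return roots[::-1]


def train(words, iterations=5):
    """Hard-EM re-estimation: dissect every word, recount roots, repeat."""
    # Initial estimate: count every 1..3 syllable candidate in the corpus.
    counts = Counter(r for w in words for r in candidate_roots(w))
    for _ in range(iterations):
        log_prob = to_log_prob(counts)
        # Each word's previous dissection stays feasible, so this never yields None.
        counts = Counter(r for w in words for r in best_dissection(w, log_prob))
    return counts


def translate_term(syllables, counts, glossary, sep=" "):
    """Compose a translation from hand-translated roots; None if any is missing."""
    roots = best_dissection(syllables, to_log_prob(counts))
    if roots is None or any(r not in glossary for r in roots):
        return None
    return sep.join(glossary[r] for r in roots)
```

For a hypothetical corpus such as `[["me", "thyl"], ["e", "thyl"], ["a", "mine"], ...]`, `train(words).most_common(10)` lists the roots the model currently favors. Because every root multiplies in a probability below 1, the unigram model tends to prefer fewer, longer roots; a full EM version would instead spread fractional counts over all dissections of each word.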


Similar resources

Enhancing Statistical Machine Translation with Bilingual Terminology in a CAT Environment

In this paper, we address the problem of extracting and integrating bilingual terminology into a Statistical Machine Translation (SMT) system for a Computer Aided Translation (CAT) tool scenario. We develop a framework that, taking as input a small amount of parallel in-domain data, gathers domain-specific bilingual terms and injects them in an SMT system to enhance the translation productivity...

A Hybrid Machine Translation System Based on a Monotone Decoder

In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. On the other hand, SMT works better in lexical choice. Therefore, in our sys...

Post-MT Term Swapper: Supplementing a Statistical Machine Translation System with a User Dictionary

A statistical machine translation (SMT) system requires homogeneous training data in order to get domain-sensitive (or context-sensitive) terminology translations. If the data consists of various domains, it is difficult for an SMT system to learn context-sensitive terminology mappings probabilistically. Yet, terminology translation accuracy is an important issue for MT users. This paper explor...

A new model for persian multi-part words edition based on statistical machine translation

Multi-part words in English language are hyphenated and hyphen is used to separate different parts. Persian language consists of multi-part words as well. Based on Persian morphology, half-space character is needed to separate parts of multi-part words where in many cases people incorrectly use space character instead of half-space character. This common incorrectly use of space leads to some s...

From Statistical Term Extraction to Hybrid Machine Translation

This study presents a new hybrid approach for translation equivalent selection within a transfer-based machine translation system using an intertwined net of traditional linguistic methods together with statistical techniques. Detailed evaluation reveals that the translation quality can be improved substantially in this way.

Publication date: 1992